Variant Discovery    ◾    119

GCF_009858895.2_ASM985889v3_genomic.fna.fai

GCF_009858895.2_ASM985889v3_genomic.fna.pac

GCF_009858895.2_ASM985889v3_genomic.fna.sa

4.2.1.1.4  Aligning short reads of FASTQ files to the reference genome

Since we have downloaded the FASTQ files and the sequence of the reference genome, it

is time to map the reads to the reference genome and to create a SAM file for each sample.

Remember that since the reads are paired end, there are two files (forward and reverse) for

each run. This time we will use the following syntax of “bwa mem” command:

bwa mem -M -t 4 \

-R “@RG\tID:….\tSM:…..” \

ReferenceSequence \

file1.fastq file2.fastq > output.sam 2> output.log

The “-M” option is to mark shorter split hits as secondary, “-t” option specifies the num-

ber of threads to speed up the mapping, and “-R” to add a header line. For variant calling,

we may need to add or create a read group (RG) header to hold the sample information.

As input, we will provide the FASTA reference sequence and the forward FASTQ file and

reverse FASTQ file in the case of paired-end reads. The output can be directed to a SAM

file.

The above form is suitable for a single run. However, we may have multiple samples and

mapping the reads of a single-sample files at a time may be tedious. Instead, we can use a

bash script with a loop to align all runs. The following script creates the subdirectory “sam”

and then aligns reads of each sample to the reference genome and outputs a SAM file and a

log file. When you run the script, make sure that the main project directory is the working

directory.

mkdir sam

cd fastq

for i in $(ls *.fastq | rev | cut -c 9- | rev | uniq);

do

bwa mem -M -t 4 \

-R “@RG\tID:${i}\tSM:${i}” \

../ref/GCF_009858895.2_ASM985889v3_genomic.fna \

${i}_1.fastq ${i}_2.fastq > \

../sam/${i}.sam 2> ../sam/${i}.log;

done

cd ..

To make sure that the read mapping process has been completed successfully, you can dis-

play the content of the “sam” directory with “ls sam”. You can also display the content of

any of the SAM files by using “less -S SamFileName”.